10 research outputs found

High-Level Hardware Feature Extraction for GPU Performance Prediction of Stencils

Author: Chen Tianqi
Henriksen Troels
Lee Seyong
Leissa Roland
McDonell Trevor L.
Steuwer Michel
Tartara Michele
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 23/02/2020

Crossref

Edinburgh Research Explorer

Converting data-parallelism to task-parallelism by rewrites: purely functional programs across multiple GPUs

Author: Holk Eric
McDonell Trevor L.
Newton Ryan R.
Svensson Bo Joel
Vollmer Michael
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2015

High-level domain-specific languages for array processing on the GPU are increasingly common, but they typically only run on a single GPU. As computational power is distributed across more devices, languages must target multiple devices simultaneously. To this end, we present a compositional translation that fissions data-parallel programs in the Accelerate language, allowing subsequent compiler and runtime stages to map computations onto multiple devices for improved performance, even for programs that begin as a single data-parallel kernel.

CiteSeerX

Kent Academic Repository

Achieving High-Performance the Functional Way: A Functional Pearl on Expressing High-Performance Optimizations as Rewrite Strategies

Author: Boyle James M
Chakravarty Manuel M. T.
Chen Tianqi
Collins Alexander
Delahaye David
Hall Mary
Henriksen Troels
Jones Simon Peyton
Lattner Chris
McDonell Trevor L.
Ragan-Kelley Jonathan
Steuwer Michel
Svensson Joel
Visser Eelco
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/08/2020

Optimizing programs to run efficiently on modern parallel hardware is hard but crucial for many applications. The predominantly used imperative languages, such as C or OpenCL, force the programmer to intertwine the code describing functionality and optimizations. This results in a portability nightmare that is particularly problematic given the accelerating trend towards specialized hardware devices to further increase efficiency. Many emerging DSLs used in performance-demanding domains such as deep learning or high-performance image processing attempt to simplify or even fully automate the optimization process. Using a high-level (often functional) language, programmers focus on describing functionality in a declarative way. In some systems such as Halide or TVM, a separate schedule specifies how the program should be optimized. Unfortunately, these schedules are not written in well-defined programming languages. Instead, they are implemented as a set of ad-hoc predefined APIs that the compiler writers have exposed. In this functional pearl, we show how to employ functional programming techniques to solve this challenge with elegance. We present two functional languages that work together, each addressing a separate concern. RISE is a functional language for expressing computations using well-known functional data-parallel patterns. ELEVATE is a functional language for describing optimization strategies. A high-level RISE program is transformed into a low-level form using optimization strategies written in ELEVATE. From the rewritten low-level program, high-performance parallel code is automatically generated. In contrast to existing high-performance domain-specific systems with scheduling APIs, in our approach programmers are not restricted to a set of built-in operations and optimizations but freely define their own computational patterns in RISE and optimization strategies in ELEVATE in a composable and reusable way.
We show how our holistic functional approach achieves competitive performance with the state-of-the-art imperative systems Halide and TVM.

Crossref

Edinburgh Research Explorer

Enlighten

Embedded pattern matching

Author: Keller Gabriele
McDonell Trevor L.
Meredith Joshua D.
Publication date: 01/01/2022

Haskell is a popular choice for hosting deeply embedded languages. A recurring challenge for these embeddings is how to seamlessly integrate user-defined algebraic data types. In particular, pattern matching, an important, convenient, and expressive feature for creating and inspecting data, is not directly available on embedded terms. We present a novel technique, embedded pattern matching, which enables a natural and user-friendly embedding of user-defined algebraic data types into the embedded language, and allows programmers to pattern match on terms in the embedded language in much the same way they would in the host language.

Utrecht University Repository

Embedding Foreign Code

Author: Gabriele Keller
Manuel M. T. Chakravarty
Robert Clifton-Everest
Trevor L. McDonell
Publication date: 03/09/2014

Special-purpose embedded languages facilitate generating high-performance code from purely functional high-level code; for example, we want to program highly parallel GPUs without the usual high barrier to entry and the time-consuming development process. We previously demonstrated the feasibility of a skeleton-based, generative approach to compiling such embedded languages. In this paper, we (a) describe our solution to some of the practical problems with skeleton-based code generation and (b) introduce our approach to enabling interoperability with native code. In particular, we show, in the context of a functional embedded language for GPU programming, how template meta-programming simplifies code generation and optimisation. Furthermore, we present our design for a foreign function interface for an embedded language.

CiteSeerX

Embedded pattern matching

Author: Keller Gabriele
McDonell Trevor L.
Meredith Joshua D.
Software Technology
Sub Software Technology
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2022

Artifact for Euro-Par 2020 paper Accelerating Nested Data Parallelism: Preserving Regularity

Author: de Wolff Ivo Gabe
Keller Gabriele K.
McDonell Trevor L.
Software Engineering and Technology
Utrecht University
van den Haak Lars
Publication venue: Figshare
Publication date: 03/07/2020

This artifact is concerned with Section 5 (Evaluation) of the paper "Accelerating Nested Data Parallelism: Preserving Regularity". It runs benchmarks of a nested quicksort and a nested Fourier transform in Accelerate, comparing the results with Futhark. Both Accelerate and Futhark are data-parallel languages. Running the benchmarks requires an Nvidia GPU, and the artifact is made to run on Ubuntu. The benchmarks can be run with Docker, or the dependencies can be installed manually and the bash scripts run directly.

Pure OAI Repository

Optimising purely functional GPU programs

Author: Ben Lippmeier
Chatterjee S.
Claessen K.
Gabriele Keller
Manuel M.T. Chakravarty
Peyton Jones S.
Sengupta S.
Trevor L. McDonell
Publication venue: 'Association for Computing Machinery (ACM)'

Crossref

Precise reasoning with structured time, structured heaps, and collective operations

Author: Bachmann Olaf
Baghdadi Riyadh
Benabderrahmane Mohamed-Walid
Bergstra Jan A.
Beyer Dirk
Brown Kevin J.
Calcagno Cristiano
Dijkstra Edsger Wybe
Distefano Dino
Farzan Azadeh
Fourier Joseph
Gurfinkel Arie
Henzinger Thomas A.
Jesse
Kincaid Zachary
Kovács Laura
McDonell Trevor L.
Mendis Charith
Owens Scott
Ragan-Kelley Jonathan
Reynolds John C.
Rompf Tiark
Safeer Ahmad Maaz Bin
Schwartz Jack
Siek Jeremy G.
Sujeeth A. K.
Svensson Bo Joel
Tate Ross
Van Engelen Robert A.
Vollmer Michael
Yakdan Khaled
Publication venue: 'Association for Computing Machinery (ACM)'

Crossref